Cluster-Computing and Parallelization for the Multi-Dimensional PH-Index
Master Thesis

Authors

  • Bogdan Aurel Vancea
  • Moira C. Norrie
  • Tilmann Zäschke
  • Christoph Zimmerli

Abstract

The storage and management of multi-dimensional data is an important aspect of many applications, such as geo-information systems, computer vision and computational geometry. Now that computers can capture and store ever-increasing amounts of multi-dimensional data, applications need to store and query this data efficiently. This work presents a distributed version of the PH-tree, a highly efficient in-memory multi-dimensional data structure that supports range and nearest-neighbour queries. We present a distribution architecture that extends the PH-tree to run on a cluster of computers. The distributed PH-tree can use the main memory of all machines in the cluster to store the multi-dimensional data. Moreover, the distributed setting allows each machine to be queried independently of the others. The proposed point distribution algorithm uses the Z-order space-filling curve to assign sections of the space to the computers in the cluster. Additionally, we present an automatic data rebalancing algorithm, which attempts to maintain an equal storage load across all computers. The performance evaluation shows that the proposed distributed version of the PH-tree scales with the number of computers in the cluster, obtaining an almost linear increase in throughput and a similar reduction in response time for point operations. Another contribution of this work is the extension of the PH-tree to support concurrent write access, allowing it to take advantage of multi-core processor architectures. We present several concurrent write access strategies with different consistency guarantees, discuss their advantages and disadvantages, and evaluate their write performance. The copy-on-write strategy allows queries to execute on snapshots of the PH-tree, while the PH-tree itself is accessed by one writer and multiple readers.
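
The abstract does not spell out the distribution algorithm, so the Python sketch below only illustrates the general idea of Z-order-based partitioning under simplifying assumptions: coordinates are non-negative integers with a fixed bit width, each host owns one contiguous range of Z-order keys, and rebalancing simply moves the range boundaries to key quantiles so that every host stores about the same number of points. The function names (`morton_key`, `even_splits`, `assign_host`, `rebalance_splits`) are hypothetical and not taken from the thesis.

```python
# Hypothetical sketch of Z-order (Morton) partitioning across cluster hosts;
# not the thesis implementation. Assumes non-negative integer coordinates.
from bisect import bisect_right
from typing import List, Sequence


def morton_key(point: Sequence[int], bits: int) -> int:
    """Interleave the bits of the coordinates into one Z-order key.

    Bits are taken from the most significant level down, one bit per
    dimension per level, so nearby points tend to receive nearby keys.
    """
    key = 0
    for level in range(bits - 1, -1, -1):
        for coord in point:
            key = (key << 1) | ((coord >> level) & 1)
    return key


def even_splits(dims: int, bits: int, hosts: int) -> List[int]:
    """Cut the key space [0, 2**(dims*bits)) into `hosts` equal-width ranges."""
    total = 1 << (dims * bits)
    return [total * i // hosts for i in range(1, hosts)]


def assign_host(point: Sequence[int], bits: int, split_keys: List[int]) -> int:
    """Host i owns keys in [split_keys[i-1], split_keys[i]); split_keys is sorted."""
    return bisect_right(split_keys, morton_key(point, bits))


def rebalance_splits(keys: Sequence[int], hosts: int) -> List[int]:
    """Move boundaries to key quantiles so each host holds ~len(keys)/hosts points."""
    if not keys or hosts < 1:
        raise ValueError("need at least one key and one host")
    ordered = sorted(keys)
    return [ordered[len(ordered) * i // hosts] for i in range(1, hosts)]
```

For example, with two dimensions, 2 bits per coordinate and four hosts, `even_splits(2, 2, 4)` gives `[4, 8, 12]`, and the points (0, 0), (1, 1), (0, 3) and (3, 0) map to hosts 0, 0, 1 and 2. Calling `rebalance_splits` on the keys of these four points with two hosts returns `[5]`, which splits them two and two. Because each host owns a contiguous stretch of the curve, a range query in such a scheme usually only needs to contact the hosts whose key ranges overlap the query region.
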
This work also presents two fine-grained locking strategies, which sacrifice consistency to allow multiple reader and writer threads to access the tree at the same time. All presented concurrency strategies take locks only for write operations and let read queries execute without taking any locks, which greatly increases read performance for large numbers of reader threads.
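
The copy-on-write idea (writers serialized by a lock, readers working lock-free on immutable snapshots) can be illustrated with a minimal Python sketch. It is not the PH-tree implementation: a plain dictionary stands in for the tree and is copied in full on every write, whereas a tree structure would typically copy only the nodes on the modified path. The class `CowIndex` and its methods are hypothetical names. The sketch relies on reading or assigning a single attribute reference being atomic in CPython, so a reader always sees either the old or the new version, never a half-updated one.

```python
# Hypothetical copy-on-write sketch: one writer at a time, lock-free readers.
# A dict stands in for the PH-tree; a real tree would copy only changed nodes.
import threading
from types import MappingProxyType
from typing import Hashable, Mapping, Optional


class CowIndex:
    def __init__(self) -> None:
        self._write_lock = threading.Lock()
        # Currently published version, exposed read-only.
        self._root: Mapping[Hashable, object] = MappingProxyType({})

    def snapshot(self) -> Mapping[Hashable, object]:
        # No lock: the returned version is never modified afterwards.
        return self._root

    def get(self, key: Hashable) -> Optional[object]:
        # Reads the current reference once, then queries that version.
        return self._root.get(key)

    def put(self, key: Hashable, value: object) -> None:
        with self._write_lock:                 # serialize writers only
            new_version = dict(self._root)     # copy the current version
            new_version[key] = value           # modify the private copy
            self._root = MappingProxyType(new_version)  # publish by swapping one reference

    def remove(self, key: Hashable) -> None:
        with self._write_lock:
            if key in self._root:
                new_version = dict(self._root)
                del new_version[key]
                self._root = MappingProxyType(new_version)
```

A query that calls `snapshot()` keeps a consistent view for as long as it runs: after `put((1, 2), "a")`, taking a snapshot, and then `put((3, 4), "b")`, the snapshot still contains only one entry while `get((3, 4))` on the index returns `"b"`. The cost is paid by writers, which must copy before every change.
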

Publication date: 2015